Yeah. You you have this going behind your ears, the yeah. Yeah. Hmm. No. Just maybe talk about um how you would give me your data file. Yeah. That's um the interface between having the topic segments and calculating the uh information importance of words. Yeah. So I I would need separate files for each segment or just maybe have delimiters inbetween each segment. Okay, yeah, and y then I can make files of that. Alright. No, I don't kno know it, no. Yeah, that's fine. I just I just need a a title for the file, but doesn't matter what it is. Mm. So uh we have then when we are mixing our um values together, is it um a value for each um expression, for each sentence or something? Yeah. Mm-mm. Yeah. Yeah, I would just end up with uh values for each word, so I wouldn't have um any boundaries for segments at all. I just I just would have the words, and then how we um how long each expression would be, I don't know. Yeah, but what is an utterance. Uh, yeah. Okay. So what we what we would fit in into the XML files, would it be uh um a value for each utterance? Or what you Okay. Mm-hmm. So then we would have to fit in more than just one where values um Yeah. But um Mm uh-huh. Yeah, it's better. So we n would need a value label for each of those segments that are not dialogue acts. For all those that point to the actual words. Yeah, that's there. No, above. Yeah. Yeah. So you mean an a an attribute. Yeah. Wouldn't it be easiest to just incorporate a new attribute in in this file? Yeah, I don't know, that's probably not very uh nice but would quite easy, wouldn't it? Yeah. But they display the segments in or that the utterances i in their um user interface as well, where the words are y um displayed. So so they have they have their utterances displayed in their interface. So that's they um access probably this file and then they display the words. So why shouldn't we be able to do that? 
Uh Yeah, the problem is probably that um I extract all the words and then uh I don't um use an I_D_ or something for it, so it would be difficult to write it back to the right position. It depends on the position, I think. Yeah. Yeah, I I could use another a different approach, though. It Mm yeah. Mm-hmm. Um would you think it would make sense to just take um file files by speaker. It's it doesn't matter what input I give to um to Rainbow. So I just could g use the files as they are. I don't know if that's that would give an output. I don't know. No, it um I think it's not uh versus the whole corpus. It's um you have certain categories, and you measure which words um have the highest information for one category. So it's the categories across each other I think. Um probably yeah. But yeah, it's it's s strange. Yeah. Yeah, but what what the problem is, I don't know exactly. I think the information gain in Rainbow is ordered by the value of the information gain, not I am not sure if I can get the right order and the values. That's the problem. But I um if the order stays the same, it's no problem at all to i just write back again. But if uh it's ordered by information gain, I don't know where the words come from, because it's it has a bag of words representation. Uh yes. Is it is. It was made for text classification. Yeah, that's the problem. Yeah, um what it actually does is that you you put in some documents, and you have several documents per category. Um and you have several categories. And then it measures um which words are typical for a certain t category. And if you get a new document, it will um compare which words are in that new document. And if there are a lot of words that um are typical for one category, it will assign it to that category, and if it's typical for another one, it will assign it to that one. Yeah, it's because you have um y um you have um oth different files. 
If if this is your directory, you have um um a diagra um, a directory one, two and three. And this represents the category. Everything that's in there is in category one. A Yeah, just split it up somehow, yeah. Yeah. No, I've just um split them up uh somehow um by um there are several documents for um each meeting, and I just put one in each category. So uh it's it's not very sensible at the moment, because I'm waiting for the um topic segments or I'm just yeah. Yeah, yeah, probably. Yeah, maybe it's it's better if I write it myself, because otherwise it's too easy to just split things up into bins, and it it wouldn't be any work at all in terms of programming or something. Alright. Mm-hmm. Mm-hmm. How does it calculate that actually? Okay. Yeah, but that's the sa uh almost I think similar to what I'm doing, because words that are in every class, that are not very informative, but words that are only in one class are are very informative for that class. Oh okay. Alright. Yeah, that's th yeah. For a specific word per per All over the place. And in my case it would have I think it would have different uh values for each category. But in that category it would have the same value at each place. Do you know what I mean? Yeah. Um that would be best, but I I would have to look if that. Um at least it works if there are several categories with each with one document each, but it um yeah. Yeah, that's the questio yeah, I don't know. Mm-hmm. Mm okay. Oh no, I li I th I thought something else that we d um we just split I need somewhere to split, and that um splitting at category boundaries um splitting at topic boundaries would a nice thing to do, rather than just splitting somewhere. So yeah, yeah. Yeah, but it I think it should work for just one document, because it it compares between the categories, and if you just have one document um it still can find out which words are informative for that category and which are not. 
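The idea described here (words that occur in every category are uninformative, words that occur in only one category are informative) can be illustrated with a small information gain calculation. This is a minimal sketch, not Rainbow's actual computation, which the speaker says is unclear. It assumes one document per category, document-level word presence, and made-up example words. The names `entropy`, `information_gain` and `docs` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of counts; 0.0 for an empty list."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs):
    """docs: list of (category, list_of_words) pairs.

    Returns {word: gain in bits}, where the gain is how much knowing
    whether the word is present in a document reduces the uncertainty
    about that document's category.
    """
    docs = [(cat, set(words)) for cat, words in docs]
    n_docs = len(docs)
    h_class = entropy(list(Counter(cat for cat, _ in docs).values()))
    vocab = set().union(*(words for _, words in docs))

    gains = {}
    for word in vocab:
        present = Counter(cat for cat, words in docs if word in words)
        absent = Counter(cat for cat, words in docs if word not in words)
        n_present = sum(present.values())
        n_absent = n_docs - n_present
        h_cond = (n_present / n_docs) * entropy(list(present.values())) \
            + (n_absent / n_docs) * entropy(list(absent.values()))
        gains[word] = h_class - h_cond
    return gains


# Invented example: three categories with one document each.
docs = [
    ("1", "the budget the cost".split()),
    ("2", "the remote the button".split()),
    ("3", "the design the colour".split()),
]
gains = information_gain(docs)
# "the" occurs in every category, so its gain is zero;
# "budget" occurs in one category only, so its gain is higher.
```

Note that this formulation yields one value per word across all categories, and the result is a dictionary keyed by word, so the original word order in the transcript is not preserved. That matches the bag-of-words concern raised earlier about writing values back to the right position.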
So if you have just one document in each category, and there are a lot of um occurrences of the word the in each one, so this word will be not very informative across the categories. So it should work for one, but I'm not sure if how exactly it it um calculates everything. Uh I think it's not possible to look that up. Yeah, maybe it's possible to have a list that it's o that is ordered by Yeah, don't know. Mm-hmm. Um I don't know what you mean by that. It's uh Mm-hmm. Oh right. Mm. Alright. Yeah, you you can use um some kind of um truncation maybe if you attach a number to each word and say that it should omit the last part. Uh probably not. No, no. Yeah, maybe I should try something different and just programme it for myself. Because um Mm. Mm-hmm. Mm-hmm. Yeah, but how can we get to know what makes sense as a a function for joining everything together? Might be difficult to find that out. Yeah. Mm-hmm. You you could always find out how many words there are in an utterance, couldn't you? Yeah. So For example. Mm. So but Mm. Mm-hmm. Mm. Yeah, then we sh yeah. We could even yeah, we could even have a look at our different measures, if they um come up with the same same kinds of yeah. Uh I think if I can provide something for the words, as they are It's stated from where to where the segments go. It should be uh should be possible. Yeah, so it must should be possible. Yeah. Mm-hmm. You mean by matching strings. Or what? Yeah, but what I've done is um a parser, an X_M_L_ parser where you can get the start times. So, yeah, that's quite easy, because it's it's an attribute and you just s say that you want the values of those attributes. I think that should be possible. I don't know how long it takes me. But Mm. Is laughter annotated at all? Because you could take that on ours out maybe. If you take the laughter out and then calculate it. Mm. Yeah. Not the moment. Maybe if we meet at the weekend. Oh I don't know. Yep. So uh no, how about the the prototype. 
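The approach mentioned above, reading start times as attribute values with an XML parser, might look roughly like the following. This is a sketch under assumptions: the element name `w`, the attributes `id`, `starttime` and `endtime`, and the `vocalsound` element for laughter are illustrative placeholders, not confirmed details of the project's files. Only the standard library `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical words file; the real element and attribute names may differ.
SAMPLE = """<words>
  <w id="w1" starttime="0.50" endtime="0.80">Yeah</w>
  <w id="w2" starttime="0.80" endtime="1.10">okay</w>
  <vocalsound id="v1" starttime="1.10" endtime="1.60" type="laugh"/>
  <w id="w3" starttime="1.60" endtime="1.90">right</w>
</words>"""


def word_start_times(xml_text, tag="w", attr="starttime"):
    """Return (id, word, start time) for every `tag` element that has `attr`.

    Elements with other tags (for example laughter) are skipped, and
    document order is preserved.
    """
    root = ET.fromstring(xml_text)
    return [
        (el.get("id"), el.text, float(el.get(attr)))
        for el in root.iter(tag)
        if el.get(attr) is not None
    ]


words = word_start_times(SAMPLE)
```

Keeping the `id` next to each word is one way to write a per-word value back to the right position later, and the start times allow words to be assigned to segments whose boundaries are given as times.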
If we want to show him something on Monday, we definitely have to work together. Some of us at least have to work together to get it running probably. Alright. Yeah, me too. Yeah, and also we don't have to recalculate if just one Okay.